Abstract

Conditions are presented for local identifiability of discrete undirected
graphical models with a binary hidden node. These models can be obtained by
extending the latent class model to allow for conditional associations
between the observed variables. We establish a necessary and sufficient
condition for the model to be locally identified almost everywhere in the
parameter space and we provide expressions of the subspace where
identifiability breaks down. The condition is based on the topology of the
undirected graph and relies on the faithfulness assumption.

1 Introduction

In this paper we focus on undirected graphical models for discrete variables 
with one binary latent variable. These models generalize the latent class 
model, by allowing associations between the observed variables conditionally 
on the latent one. Allowing conditional associations between the observed 
variables in a latent class model is an alternative approach to add latent 
classes; see [H]. The practical importance of this issue is witnessed by 
several applied papers; see e.g. [7], [S] and [2T] . 

In the recent literature the algebraic and geometric features of Bayesian
networks with hidden nodes have been studied, see e.g. [HI HS1 EH]. As
mentioned in these papers, when some of the variables are never observed, 
non-identifiability and local maxima in the likelihood function can occur 
and the dimension of the model is not easily computed. Furthermore, it has 
been shown in [10] that undirected discrete graphical models with no hidden 
variables are linear exponential families, while latent class models and more
generally directed discrete graphical models with hidden nodes are members 
of the stratified exponential family, for which standard asymptotic results 
do not apply. 

We concentrate on the identification of discrete undirected graphical 
models with one unobserved binary variable and establish a necessary and 
sufficient condition for the rank of the parametrization to be full. This en- 
sures local identification of this class of models. The condition is based on 
the topology of the undirected graph associated to the model and relies on 
the faithfulness assumption. For non-full rank models, the obtained char- 
acterization allows us to find the expression of the (sub)space where the 
identifiability breaks down. Geometrically, this corresponds to the singular- 
ities in the parameter space [U [TUl [22] . 

The non-identifiability issue mentioned above has considerable repercus- 
sions on the asymptotic properties of standard model selection criteria (e.g. 
likelihood ratio statistic and other criteria, such as BIC), whose applicability 
and correctness may no longer hold. As stressed in [61 [TUl [22] even when 
singularities are removed, the standard asymptotic tools may still be inap- 
propriate for model selection of Bayesian networks with hidden nodes. In 
particular, in [22] an adjusted BIC score for naive Bayesian networks with
one hidden variable is presented, with the correction depending on the types 
of singularities of the sufficient statistics of the postulated model. Large sam- 
ple distributions of the likelihood ratio test with reference to singularities 
are studied in [6]. 

In Section 2 the model is presented, while in Section 3 the main deriva- 
tions are detailed for the binary case. In Section 4, we extend the results to 
more complex models while in Section 5 we present our conclusions. 

2 Identification of the discrete undirected graphical model

Let $G^K = (K, E)$ be an undirected graph with node set $K = \{0, 1, \ldots, n\}$
and edge set $E = \{(i, j)\}$ whenever vertices $i$ and $j$ are adjacent in $G^K$,
$0 \le i < j \le n$. To each node $v$ is associated a discrete random variable
$A_v$. A discrete undirected graphical model is a family of joint distributions
of the variables $A_v$, $v \in K$, satisfying the Markov property with respect
to $G^K$, namely that $A_v$ and $A_u$ are conditionally independent given all the
remaining variables whenever $u$ and $v$ are not adjacent in $G^K$; see [13] (Ch.
3) for definitions and concepts.

Let $A_0$ be a binary unobserved variable and $O = \{1, \ldots, n\}$ be the set of
nodes associated to observed variables $A_v$ with $v \in O$. In the following we
let $G^B$ indicate the (sub)graph $G^B = (B, E_B)$ of $G^K$ induced by $B \subseteq K$.
We denote with $\bar{G}^B = (B, \bar{E}_B)$ the complementary graph of the (sub)graph
$G^B$, where $\bar{E}_B$ is the edge set formed by the pairs $(i, j) \notin E_B$ with
$i, j \in B$ ($i \neq j$). In Figure 1(b) the complementary graph $\bar{G}^O$
corresponding to the graph $G^K$ of Figure 1(a) is presented. We later prove that
the corresponding graphical model is locally identified.

Let $l_v$ indicate the number of levels of $A_v$, $v \in K$, and let
$l = \prod_{v=1}^{n} l_v$. Without loss of generality we assume that the variable
$A_v$ takes value in $\{0, \ldots, l_v - 1\}$. We consider the multidimensional
contingency table obtained by the cross classification of $N$ objects according
to $A_v$. Let $X$ be the $2l \times 1$ vector of entries of the contingency table,
stacked in a way that $A_0$ is running the slowest.

Data for contingency tables can be collected under various sampling
schemes (see [13], Ch. 3). We assume for now that the elements of $X$ are
independent Poisson random variables with $E(X) = \mu_X$. Let $\log \mu_X = Z\beta$,
where $Z$ is a $2l \times p$ design matrix defined in a way that the joint
distribution of $A_v$, $v \in K$, factorizes according to $G^K$ and such that the
model is graphical; $\beta$ is a $p$-dimensional vector of unknown parameters. We
adopt the corner point parametrization that takes as first level the cell with
$A_v = 0$, for all $v \in K$, see e.g. [1]. We denote by $Y$ the $l \times 1$ vector
of the counts in the marginal table, obtained by the cross classification of the
$N$ objects according to the observed variables only. The vector $Y$ is stacked in
a way that $Y = LX$, with $L = (1, 1) \otimes I_l$. By construction, the elements
of $Y$ are independent Poisson random variables with
$\mu_Y = L\mu_X = L e^{Z\beta}$.

Let $P_Y(y; \beta)$ be the joint probability distribution of $Y$. Let $\Omega$ be the
parameter space. A (parametric) model is globally identifiable if there exist
no two distinct parameter values $\beta, \beta^0 \in \Omega$ such that
$P_Y(y; \beta^0) = P_Y(y; \beta)$ (see PQ). A (parametric) model is locally
identifiable in $\beta^0$ if there exists an open neighborhood of $\beta^0$
containing no other $\beta \in \Omega$ such that $P_Y(y; \beta^0) = P_Y(y; \beta)$.
If this condition is true for all $\beta^0 \in \Omega$ the model is locally
identified. Global identifiability of a model implies local identifiability,
but not vice versa.

Figure 1: Example of (a) a graph $G^K$ and (b) the corresponding complementary graph $\bar{G}^O$

We recall that, by the inverse function theorem, a model is locally
identified if the rank of the transformation from the natural parameters
$\mu_Y$ to the new parameters $\beta$ is full. In this context, this is equivalent
to the rank of the following derivative matrix

$$ D(\beta) = \frac{\partial \mu_Y}{\partial \beta^T} = L R Z \qquad (1) $$

being full, where $R = \mathrm{diag}(\mu_X)$. Note that the $(i, j)$-th element of
$D(\beta)$ is the partial derivative of the $i$-th component of $\mu_Y$ with respect
to $\beta_j$, the $j$-th element of $\beta$. The multinomial case can be addressed in
an analogous way to the Poisson, after noting that the rank of the matrix
$D(\beta)$ is equivalent to that of its submatrix $D_0(\beta)$ obtained by deleting
the last column.
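
The rank condition can be explored numerically. The following is a minimal sketch, assuming the derivative matrix takes the form $D(\beta) = L R Z$ with $R = \mathrm{diag}(\mu_X)$ and $L = (1, 1) \otimes I_l$, as follows from $\mu_Y = L e^{Z\beta}$. The helper names (`design_matrix`, `derivative_matrix`, `latent_class_terms`) and the random parameter draw are illustrative assumptions, not part of the paper. It builds the corner point design matrix for the latent class model with binary variables and compares the rank of $D(\beta)$ with the number of parameters for $n = 2$ and $n = 3$.

```python
import itertools
import numpy as np

def design_matrix(n, terms):
    """Corner point design matrix Z for binary variables A_0, ..., A_n.

    Rows are cells (a_0, a_1, ..., a_n) with A_0 running slowest; each
    column is a parameter indexed by a frozenset of nodes (the empty set
    is the general mean). An entry is 1 when every variable in the term
    equals 1 in that cell.
    """
    cells = list(itertools.product([0, 1], repeat=n + 1))
    return np.array(
        [[int(all(cell[v] == 1 for v in term)) for term in terms]
         for cell in cells],
        dtype=float,
    )

def derivative_matrix(n, terms, beta):
    """D(beta) = L R Z with R = diag(mu_X) and L = (1, 1) kron I_l."""
    Z = design_matrix(n, terms)
    mu_x = np.exp(Z @ beta)
    L = np.kron(np.ones((1, 2)), np.eye(2 ** n))
    return L @ np.diag(mu_x) @ Z

def latent_class_terms(n):
    """General mean, main effects and {0, i} interactions (hypothetical helper)."""
    terms = [frozenset(), frozenset({0})]
    terms += [frozenset({i}) for i in range(1, n + 1)]
    terms += [frozenset({0, i}) for i in range(1, n + 1)]
    return terms

rng = np.random.default_rng(0)  # arbitrary parameter point, for illustration
ranks = {}
for n in (2, 3):
    terms = latent_class_terms(n)
    beta = rng.normal(size=len(terms))
    D = derivative_matrix(n, terms, beta)
    # Store (numerical rank, number of parameters) for each n.
    ranks[n] = (int(np.linalg.matrix_rank(D)), len(terms))
    print(n, D.shape, ranks[n])
```

With $n = 2$ the matrix has only four rows for six parameters, so it cannot have full column rank; with $n = 3$ the matrix is square and a generic parameter point is expected to give full rank.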

Note that, by setting $t_j = e^{\beta_j}$ for any parameter $\beta_j$, the
parametrization map turns into a polynomial one. This implies, see e.g.
[1 H I16j, that if there exists a point in the parameter space of $t_j$, and
therefore in $\Omega$, at which the Jacobian has full rank, then the rank is full
almost everywhere. Therefore, either there is no point in the parameter space at
which the rank is full, or the rank is not full in a subspace of null measure.
The object of this paper is (a) to establish a necessary and sufficient
condition for the rank of $D(\beta)$ to be full almost everywhere and (b) to
provide expressions of the subspace of null measure where identifiability
breaks down.

3 Local identification with binary variables only 

In this section we consider graphical models for binary variables only and
assume that all $n$ observed variables are connected to the unobserved one,
i.e. $(u, 0) \in E$ for any observed variable $u \in O$. Both assumptions will be
relaxed in Section 4.

Figure 2: Two examples of $G^K$ graphs

In this paper we assume that the graph is faithful. This implies that
for each complete subgraph $G^S = (S, E_S)$, $S \subseteq O$, (a) if $|S| = 1$ there
is a non-zero main effect of $A_S$ and a non-zero second order interaction term
between $A_S$ and $A_0$; (b) if $|S| > 1$ there is a non-zero interaction term of
order $|S|$ among the variables in $S$ and of order $|S| + 1$ among the variables
in $\{0, S\}$.

Let $t$ be the maximum order of the non-zero interaction terms among the
variables in $O$. For each order $k$, $k \in \{2, \ldots, t\}$, of interaction between
the observed variables, let $s_k$ be the number of interaction terms of order $k$.
We use $I_{k,r}$ to denote the set of vertices in $O$ having a non-zero $r$-th
interaction term of order $k$, $r \in \{1, \ldots, s_k\}$. Note that, by construction,
$|I_{k,r}| > 1$. The following example clarifies the notation.

Example 1. The model with graph $G^K$ as in Figure 2(a) has maximum
order $t = 2$ and $s_2 = 2$, with interaction sets $I_{2,1}$ and $I_{2,2} = \{2, 3\}$.
The model with graph $G^K$ as in Figure 2(b) has maximum order $t = 3$. For
$k = 2$, $s_2 = 3$ with $I_{2,1} = \{1, 2\}$, $I_{2,2} = \{2, 3\}$ and
$I_{2,3} = \{1, 3\}$; for $k = 3$, $s_3 = 1$ with $I_{3,1} = \{1, 2, 3\}$.

Consider $I \subseteq O$ and let $\mu_I$ be the element of $\mu_Y$ associated to the
entry of the contingency table having value zero for all variables except the
variables in $I$. Let $d_I$ be the row of the matrix $D(\beta)$ corresponding to the
first order partial derivative of $\mu_I$ with respect to $\beta$. Note that
$\beta_v$, $v \in K$, represents the main effect of the random variable $A_v$ and,
for each subset $I \subseteq O$ such that $|I| > 1$, $\beta_I$ is the interaction
term between the variables in $I$. With $\beta_{\{0,I\}}$ we denote the interaction
term between the variables in $\{0, I\}$. Moreover, $\beta_\emptyset = \mu$ is the
general mean. With reference to Example 1, let $I = \{1, 2\}$. Then $\mu_I$ is the
expected value of the entry $(1, 1, 0, 0)$, $d_I$ is the row of $D(\beta)$
corresponding to the partial derivative of $\mu_I$ with respect to $\beta$, and
$\beta_I$ is the term expressing the second order interaction between $A_1$ and
$A_2$.

With this notation, to each generic $ij$-element of $D(\beta)$ we can associate
the set $I$, $I \subseteq O$, of the observed variables taking value one in row $i$,
and the set $J$, $J \subseteq K$, of variables of which $\beta_j$ represents either
the main effect or the interaction term. Note that both $I$ and $J$ could be the
empty set. Let $z_i$ be the $i$-th row of $Z$. It is then easy to see that this
generic $ij$-element is 0 if $J \setminus \{0\} \not\subseteq I$; otherwise it is
equal to $e^{z_{i+l}\beta}$ if $0 \in J$ and to $e^{z_i\beta} + e^{z_{i+l}\beta}$ if
$0 \notin J$. Furthermore, let $S$ be a complete subgraph of $G^O$ and
$S' \supset S$. For the rows $d_S$ and $d_{S'}$ and the columns $\beta_S$ and
$\beta_{\{0,S\}}$, the $2 \times 2$ square sub-matrix of $D(\beta)$ has the
following structure:

$$
\begin{pmatrix}
e^{a}(1 + e^{b}) & e^{a+b} \\
e^{a+a'}(1 + e^{b+b'}) & e^{a+a'+b+b'}
\end{pmatrix} \qquad (2)
$$

with $a = \mu + \sum_{I \subseteq S} \beta_I$,
$b = \beta_0 + \sum_{I \subseteq S} \beta_{\{0,I\}}$,
$a' = \sum_{I \subseteq S', I \not\subseteq S} \delta(I)\beta_I$ and
$b' = \sum_{I \subseteq S', I \not\subseteq S} \delta(I)\beta_{\{0,I\}}$, where
$\delta(I) = 1$ if $I$ is complete on $G^O$ and 0 otherwise. Matrix (2) is not
full rank if and only if $b' = 0$.
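
The rank statement for the $2 \times 2$ block can be checked by expanding its determinant. The short derivation below is a worked step added for illustration; it uses only the entries $e^{a}(1+e^{b})$, $e^{a+b}$, $e^{a+a'}(1+e^{b+b'})$ and $e^{a+a'+b+b'}$ of the block.

```latex
\begin{align*}
\det\begin{pmatrix}
e^{a}(1+e^{b}) & e^{a+b}\\
e^{a+a'}(1+e^{b+b'}) & e^{a+a'+b+b'}
\end{pmatrix}
&= e^{2a+a'+b}\left[(1+e^{b})\,e^{b'} - \left(1+e^{b+b'}\right)\right]\\
&= e^{2a+a'+b}\left(e^{b'}-1\right).
\end{align*}
```

Since the exponential factor is strictly positive, the determinant vanishes exactly when $e^{b'} = 1$, that is when $b' = 0$.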

We first consider the latent class model, i.e. a model which encodes the
following conditional independencies: for each pair of variables $u, v \in O$,
$A_u \perp\!\!\!\perp A_v \mid A_0$. The following proposition is a restatement of a
well-known result.

Proposition 1. A latent class model with $A_0$ a binary variable is full-rank
everywhere in the parameter space if and only if $n \ge 3$.

Proof. See [El H]. 

The (unidentified) case of n = 2 is addressed in [5] ; see [TOj, [181 EH E2] for 
the Bayesian network literature. 

We now remove the assumption that the observed variables are independent
conditionally on the latent one, to include a more general class of
graphical models $G^K$ over the variables $A_v$, $v \in K$, such that $(0, u) \in E$
for all $u \in O$. We first consider graphical models whose associated
complementary graphs $\bar{G}^O$ are connected and have at least an $m$-clique $C$
with $m \ge 3$; we let $\bar{C} = O \setminus C$. The model in Figure 1 is an
example of such a model. The idea is to show that an ordering of the variables
in $\bar{C}$ exists such that, after re-ordering the rows of $D(\beta)$
accordingly, $D(\beta)$ admits a lower block-triangular square sub-matrix with
full rank. Since $\bar{G}^O$ is connected, for all $j \in \bar{C}$ there exists a
path that connects $j$ with at least a node of $C$.

The following algorithm provides the ordering of the variables in $\bar{C}$.

Step 1. $U \leftarrow \bar{C}$.

Step 2. $T \leftarrow U$ and $U \leftarrow \emptyset$.

Step 3. Check if $T$ is empty, in which case $\bar{C}$ is ordered; otherwise search
for the farthest node $j \in T$ from $C$, i.e. the node with the highest number
of edges in the path going from $j$ to $C$; if there is more than one such node,
choose any one.

Step 4. Let $J_j$ be the intersection of the set $T$ and the set of nodes inside
the path from $j$ to $C$; order $J_j$ starting from $j$; let
$T \leftarrow T \setminus J_j$.

Step 5. If the last node of $J_j$ is connected to a node of $C$, then append $J_j$
to $U$ as the last group of elements (so $U \leftarrow U \cup J_j$); otherwise, if
the last node of $J_j$ is connected to some node $i \notin T$, order $J_j$ before
$i$ in $U$; go to Step 3.
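
The purpose of the ordering is that every node outside the clique gets paired with a neighbour in the complementary graph that lies closer to the clique. The sketch below does not reproduce Steps 1 to 5 literally; it swaps in a plain breadth-first search from the clique, which yields pairs with the same property. The function names and the choice of clique $\{1, 3, 5\}$ are illustrative assumptions; the edge list is the observed graph whose cliques are $\{1,2\}$, $\{2,3\}$, $\{3,4\}$, $\{4,5\}$, as in Example 2.

```python
from collections import deque

def complement_edges(nodes, edges):
    """Edges of the complementary graph on `nodes` (pairs not in `edges`)."""
    present = {frozenset(e) for e in edges}
    nodes = sorted(nodes)
    return {frozenset((u, v)) for i, u in enumerate(nodes)
            for v in nodes[i + 1:] if frozenset((u, v)) not in present}

def pair_with_closer_neighbour(nodes, edges, clique):
    """Pair each node outside `clique` with a complement-graph neighbour
    that is one step closer to `clique` (breadth-first search).

    Returns a list of (i, j) pairs ordered from farthest to nearest, or
    None when some node cannot reach the clique in the complement graph.
    """
    adj = {v: set() for v in nodes}
    for e in complement_edges(nodes, edges):
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    dist = {c: 0 for c in clique}
    parent = {}
    queue = deque(sorted(clique))
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    rest = [v for v in nodes if v not in clique]
    if any(v not in dist for v in rest):
        return None  # complement graph is not connected
    rest.sort(key=lambda v: -dist[v])  # farthest nodes first
    return [(v, parent[v]) for v in rest]

# Observed graph with cliques {1,2}, {2,3}, {3,4}, {4,5}.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
pairs = pair_with_closer_neighbour(nodes, edges, clique={1, 3, 5})
print(pairs)
```

Each returned pair $(i, j)$ satisfies $(i, j) \notin E$, which is the property used to build the $2 \times 2$ blocks in the lemma that follows.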

Lemma 1. Let $G^K$ be an undirected graphical model over the binary variables
$(A_0, A_1, \ldots, A_n)$ with $A_0$ unobserved and with $(0, u) \in E$, for all
$u \in O$. Assume that in $\bar{G}^O$ there exists an $m$-clique $C$, $m \ge 3$. Let
$\bar{C} = O \setminus C$ and let $M_1$ be the sub-matrix of $D(\beta)$ formed by the
rows $d_i$ and $d_{\{i,j\}}$, with $i \in \bar{C}$ and $j$ such that
$(i, j) \notin E$, and by the columns $\beta_i$ and $\beta_{\{0,i\}}$. Then $M_1$ has
rank equal to $2|\bar{C}|$ everywhere in the parameter space if $\bar{G}^O$ is
connected.

Proof. If $\bar{G}^O$ is connected, there exists an ordering (see the previous
algorithm) of the nodes of $\bar{C}$ such that for any $i$, $1 \le i \le |\bar{C}|$,
the node $j = i + 1$ is such that $(i, j) \notin E$. Such an ordering generates
$|\bar{C}|$ pairs. Let $M_1^*$ be the sub-matrix of $M_1$ made up of the rows $d_i$,
$d_{\{i,i+1\}}$, $1 \le i \le |\bar{C}|$. Then $M_1^*$ is a $2|\bar{C}|$-square lower
block-triangular matrix with blocks $M^i$ associated to the rows $d_i$,
$d_{\{i,j\}}$ and the columns $\beta_i$ and $\beta_{\{0,i\}}$. The structure of $M^i$
is as (2) with $a = \mu + \beta_i$, $b = \beta_0 + \beta_{\{0,i\}}$, $a' = \beta_j$
and $b' = \beta_{\{0,j\}}$, since by construction $(i, j) \notin E$. As
$\beta_{\{0,j\}} \neq 0$ by the faithfulness assumption, it follows that $M_1^*$ is
full rank and so is $M_1$.



Lemma 2. Let $G^K$ be an undirected graphical model over the variables
$(A_0, A_1, \ldots, A_n)$ with $A_0$ unobserved and with $(0, u) \in E$, for all
$u \in O$. Let $I_{k,r}$ be a complete subgraph of $G^O$ with $k \ge 2$. Suppose that
there exists a sequence $\{I_s\}_{s=0}^{q+1}$, $q \ge 0$, of complete subgraphs of
$G^O$ such that $I_0 = I_{k,r}$, $I_s \neq I_{s'}$ for $s \neq s'$, with
$s, s' \in \{0, \ldots, q + 1\}$, satisfying the following assumptions:

(a) for all $s \in \{0, \ldots, q\}$ and for all $i \in I_s$ there exists
$j \in I_{s+1}$ such that $(i, j) \notin E$;

(b) for all $s \in \{0, \ldots, q\}$, $|I_s| = k$ and $|I_{q+1}| < k$;

then $D(\beta)$ contains at least one square sub-matrix $M_{k,r}$ of order
$2(q + 1)$ formed by the rows $d_{I_s}$ and $d_{\{V, I_s\}}$,
$V \subseteq (I_{s+1} \setminus I_s)$, and by the columns associated to
$\beta_{I_s}$ and $\beta_{\{0,I_s\}}$, $s \in \{0, \ldots, q\}$, that has full rank
everywhere in the parameter space.

Conversely, if $D(\beta)$ is full rank everywhere in the parameter space, then
for any complete subgraph $I_{k,r}$ of $G^O$ with $k \ge 2$ there is a sequence of
complete subgraphs satisfying the assumptions (a) and (b).

Proof. We prove the sufficiency first. Consider all the sub-matrices
$M_{k,r}$. Observe that a row, and therefore a column, cannot be chosen twice
in a matrix $M_{k,r}$, as $I_s \neq I_{s'}$. By ordering the rows and columns
according to the sequence $\{I_s\}$, the matrix $M_{k,r}$ is seen to be lower
block triangular. The blocks are $N_0, \ldots, N_q$, where $N_s$ is formed by the
rows $d_{I_s}$ and $d_{\{V, I_s\}}$ and by the columns associated to $\beta_{I_s}$
and $\beta_{\{0,I_s\}}$. Therefore $N_s$ is as (2). Then
$\mathrm{rank}(M_{k,r}) = \sum_{s=0}^{q} \mathrm{rank}(N_s)$, and $M_{k,r}$ is full
rank if and only if every block is full rank, that is if the rank of each block
is equal to 2. Suppose that there is an index $s$ such that no block $N_s$ has
full rank, for all choices of $V \subseteq I_{s+1}$. Then, from (2), there exists
a constant $0 < h < 1$ such that

$$
\begin{cases}
e^{\sum_{I \subseteq I_s} (\beta_I + \beta_{\{0,I\}})}
 = h\, e^{\sum_{I \subseteq I_s} \beta_I}
 \left(1 + e^{\sum_{I \subseteq I_s} \beta_{\{0,I\}}}\right) \\
e^{\sum_{I \subseteq I_s \cup V} \delta(I)(\beta_I + \beta_{\{0,I\}})}
 = h\, e^{\sum_{I \subseteq I_s \cup V} \delta(I)\beta_I}
 \left(1 + e^{\sum_{I \subseteq I_s \cup V} \delta(I)\beta_{\{0,I\}}}\right)
 & \text{for all } V \subseteq I_{s+1},
\end{cases}
$$

where $\delta(I) = 1$ if $I$ is complete in $G^O$ and 0 otherwise. The previous
system implies

$$
\begin{cases}
\sum_{I \subseteq I_s} \beta_{\{0,I\}} = \ln \frac{h}{1-h} \\
\sum_{I \subseteq I_s \cup V,\ I \not\subseteq I_s} \delta(I)\beta_{\{0,I\}} = 0
 & \text{for all } V \subseteq I_{s+1}.
\end{cases}
$$

From the fact that the model is graphical we obtain:

$$
\begin{cases}
\sum_{I \subseteq I_s} \beta_{\{0,I\}} = \ln \frac{h}{1-h} \\
\sum_{I \subseteq I_s^V} \beta_{\{0,V,I\}} = 0
 & \text{for all } V \subseteq I_{s+1},
\end{cases} \qquad (3)
$$

where $I_s^V = \bigcap_{j \in V} \{i \in I_s : (i, j) \in E\}$ is the intersection,
for varying $j \in V$, of the subsets of nodes in $I_s$ connected to $j$ in $G^O$.
Note that by (a), for $V = I_{s+1}$, $I_s^{I_{s+1}} = \emptyset$. This implies that
$\beta_{\{0,V\}} = 0$, which contradicts the faithfulness assumption since
$I_{s+1}$ is a complete subgraph of $G^O$. Therefore there exists for each $s$ a
full rank block $N_s$, and the square sub-matrix $M_{k,r}$ is full rank
everywhere in the parameter space.



We now prove the necessity. Since $D(\beta)$ is full rank everywhere, the
sub-matrix of $D(\beta)$ formed by all rows of $D(\beta)$ and by the columns
$\beta_{I_{k,r}}$, $\beta_{\{0,I_{k,r}\}}$ is full column rank for all
$\beta \in \Omega$.

Note that if there exists a sequence of complete subgraphs satisfying
(a), but such that for some $s \in \{1, \ldots, q\}$ $|I_s| > k$, then there exists
also a sequence satisfying $|I_s| = k$: in fact, if for $i \in I_s$ there exists a
$j \in I_{s+1}$ such that $(i, j) \notin E$, then $I_{s+1}$ can be chosen in a way
that $|I_{s+1}|$ is not greater than $|I_s|$. Therefore, either there is no
sequence of $I_s$ such that (a) is satisfied or there is no $I_{q+1}$ such that
$|I_{q+1}| < k$.

Going by contradiction, suppose that for $I_{k,r} = I_0$ there is no complete
subgraph $I_1$ in $G^O$ such that for each $i \in I_0$ there is $j \in I_1$ with
$(i, j) \notin E$. Select the sub-matrix $C_{k,r}$ formed by the columns
$\beta_{I_0}$, $\beta_{\{0,I_0\}}$ and all the rows such that these two columns have
non-zero components, that is select all rows $d_V$, $V \supseteq I_0$. (Note that
in all other rows the two elements are both 0.) Denote with
$\Omega_{k,r} \subset \Omega$ the following subspace:

$$
\begin{cases}
\sum_{I \subseteq I_0} \beta_{\{0,I\}} = \ln \frac{h}{1-h} \\
\beta_{\{0,V_0\}} + \sum_{I \subseteq I_0^{V_0}} \beta_{\{0,I,V_0\}} = 0
\end{cases} \qquad (4)
$$

where $V_0$ is any complete subgraph in $G^O$ such that for each $j \in V_0$ there
is at least an $i \in I_0$ with $(i, j) \in E$, and
$I_0^{V_0} = \bigcap_{j \in V_0} \{i \in I_0 : (i, j) \in E\}$. Violation of
assumption (a) implies that $I_0^{V_0} \neq \emptyset$. Then, it is easy to verify
that for $\beta \in \Omega_{k,r}$ the columns of $C_{k,r}$ are linearly dependent.
In fact, consider system (3) where $I_s = I_0$. System (3) has a solution for
$\beta \in \Omega_{k,r}$. This contradicts the assumption that $D(\beta)$ is full
rank everywhere.

Suppose now that there exists an $I_0 = I_{k,r}$ such that there is no sequence
for $I_0$ such that $|I_{q+1}| < k$. We should find a square sub-matrix of
$D(\beta)$ with full rank having columns associated to $\beta_{I_s}$ and
$\beta_{\{0,I_s\}}$, $s \in \{0, \ldots, q\}$, i.e. such that the columns associated
to $\beta_{I_s}$ and $\beta_{\{0,I_s\}}$ are linearly independent. From the previous
derivations, we should consider the rows associated to $I_s$ and
$V_s = \{I_s, I_{s+1}\}$ (otherwise, system (3) has a solution for
$\beta \in \Omega_{k,s}$). But, as there is no $I_{q+1}$ such that $|I_{q+1}| < k$,
$I_{q+1}$ coincides with some $I_s$ in the sequence. Therefore, we cannot build a
sub-matrix $C_k$ with full rank.



Obviously, the sequence of complete subgraphs satisfying the assumptions
(a) and (b) in Lemma 2 is not necessarily unique.

With reference to Figure 1, let $I = \{1, 2\}$. The square sub-matrix with
rows $d_I$ and $d_{\{4,I\}}$, and columns $\beta_I$ and $\beta_{\{0,I\}}$, is full
rank, as the sequence $I_0 = \{1, 2\}$, $I_1 = \{4\}$ satisfies the assumptions of
Lemma 2. Let $I = \{2, 3\}$. The square sub-matrix with rows $d_I$ and
$d_{\{5,I\}}$ and columns $\beta_I$ and $\beta_{\{0,I\}}$ is also full rank, as the
sequence $I_0 = \{2, 3\}$, $I_1 = \{5\}$ satisfies the assumptions of Lemma 2. The
same holds for $I_0 = \{3, 4\}$ and $I_0 = \{4, 5\}$.

Remark 1. If there is a sequence satisfying the assumptions (a) and
(b) but $I_s = I_{s'}$ for some $s \neq s'$, $s < s'$, then there is also a shorter
sequence satisfying the same assumptions, which is constructed by excluding
the interactions from $I_{s+1}, \ldots, I_{s'}$.

Remark 2. An equivalent formulation of the assumption (a) of Lemma 2
is that for $s \in \{0, \ldots, q\}$ and for all $i \in I_s$ there exists
$j \in I_{s+1}$ such that $i$ and $j$ are connected in the complementary graph
$\bar{G}^O$.

Remark 3. The fact that the assumptions (a) and (b) of Lemma 2 hold for
all complete subgraphs $I_{2,r}$ of $G^O$ with $|I_{2,r}| = 2$, $r = 1, \ldots, s_2$,
does not imply that they hold also for the complete subgraphs $I_{k,v}$,
$v = 1, \ldots, s_k$, of $G^O$ such that $I_{k,v} \supset I_{2,r}$. As a matter of
fact, consider the complementary graph $\bar{G}^O$ of Figure 3. We can see that
conditions (a) and (b) hold for each complete subgraph of $G^O$ containing two
nodes. In particular for $I_0 = \{1, 4\}$ we have $I_1 = \{2\}$ and for
$I_0 = \{3, 4\}$ we have $I_1 = \{2\}$. However, the set $I = \{1, 4, 5\}$ does not
verify the condition (b), as $\mathrm{bd}(I) = \{2, 3, 6\}$, with $\{2, 3, 6\}$ a
complete subgraph in $G^O$ with no single node in $\{2, 3, 6\}$ adjacent to all
nodes in $I$.

Figure 3: The complementary graph $\bar{G}^O$ of Remark 3

Suppose that for each fixed order $k$ of interaction, $k \in \{2, \ldots, t\}$, the
sets $I_{k,r}$, $r = 1, \ldots, s_k$, satisfy the assumptions of Lemma 2. For each
$I_{k,r}$ there is then a full rank sub-matrix $M_{k,r}$ of $D(\beta)$ with rows
$d_{I_s}$, $d_{\{V, I_s\}}$, $V \subseteq I_{s+1}$, and columns $\beta_{I_s}$ and
$\beta_{\{0,I_s\}}$, $s \in \{0, \ldots, q\}$. We denote with $P_k$ the matrix formed
by all rows of $D(\beta)$ and the columns used to build all the matrices
$M_{k,r}$, $r \in \{1, \ldots, s_k\}$. By construction, a row, and therefore a
column, cannot appear in more than one $M_{k,r}$. Then, $P_k$ is a sub-matrix of
$D(\beta)$ which is full column rank, as it is a block-triangular matrix with
full-rank blocks $M_{k,r}$. In fact, the matrix $M_{k,r}$ has zero components in
the columns associated to $\beta_{I_{k,r'}}$ and $\beta_{\{0,I_{k,r'}\}}$ for
$r' \neq r$, so $P_k$ is a lower block-triangular matrix with blocks full rank
everywhere in the parameter space, and is therefore full rank for all
$\beta \in \Omega$. The following Lemma then holds.

Lemma 3. Let $P = [P_2 \mid \cdots \mid P_t]$ be the sub-matrix of $D(\beta)$, with
$P_k$, $k \in \{2, \ldots, t\}$, constructed as previously described. Then $P$ is
full column rank everywhere in the parameter space.

Proof. From the fact that the model is graphical, $P$ is a lower
block-triangular matrix, since if $\beta_I = 0$ then $\beta_{I'} = 0$ for all
$I' \supset I$. The blocks are full column rank everywhere in the parameter
space.

Remark 4. The assumptions of Lemma 3 imply that the complementary
graph $\bar{G}^O$ is connected. In fact, let $\bar{G}^O$ be a complementary graph
with two connected components and consider any set formed by two nodes not
belonging to the same connected component. A sequence with $|I_{q+1}| = 1$
cannot exist, because the nodes in $I_s$ (for $s \in \{0, \ldots, q\}$) belong to two
different components of $\bar{G}^O$ and the intersection between the
neighborhoods of $i, j \in I_q$ in $\bar{G}^O$ is the empty set.

The next proposition leads to the same conclusion; the proof allows to 
determine the subspace where the rank of the parametrization is not full. 

Proposition 2. Let $\beta$ be the vector of the parameters of an undirected
graphical model $G^K$ over the binary variables $(A_0, A_1, \ldots, A_n)$, with
$A_0$ unobserved and $(0, u) \in E$, for all $u \in O$. If $\bar{G}^O$ is not
connected, then $D(\beta)$ is not full rank everywhere in the parameter space.

Proof. If $\bar{G}^O$ is not connected, then $\bar{G}^O$ has two or more connected
components. Let $G^1 = (V_1, E_1)$ and $G^2 = (V_2, E_2)$ be two of them. For any
$i \in V_1$, consider any complete set $I_i$ in $G^O$ of nodes adjacent in
$\bar{G}^O$ to $i$. For any $j \in V_2$, $(i, j) \in E$ and $(u, j) \in E$ for any
$u \in I_i$. Let $\Omega_j \subset \Omega$ be the subspace of $\Omega$ such that

$$ \beta_{\{0,j\}} + \sum_{J \subseteq I_i} \beta_{\{0,J,j\}} = 0. \qquad (5) $$

Then, the rank of $D$ is not full for $\beta \in \Omega_j$.





We can then prove the following: 

Proposition 3. Let $\beta$ be the vector of the parameters of an undirected
graphical model $G^K$ over the binary variables $(A_0, A_1, \ldots, A_n)$, with
$A_0$ unobserved and $(0, u) \in E$, for all $u \in O$. A necessary and sufficient
condition for $D(\beta)$ to be full rank everywhere in the parameter space is
that:

(i) $\bar{G}^O$ contains at least one $m$-clique $C$, with $m \ge 3$;

(ii) for each complete subgraph $I_0$ of $G^O$, there exists a sequence
$\{I_s\}_{s=1}^{q+1}$ of complete subgraphs in $G^O$ such that:

(a) for $s \in \{1, \ldots, q\}$, $|I_s| = |I_0|$ and $|I_{q+1}| < |I_0|$;

(b) for $s \in \{1, \ldots, q\}$ and for all $i \in I_s$ there exists
$j \in I_{s+1}$ such that $i$ and $j$ are connected in the complementary graph
$\bar{G}^O$.

Proof. We prove the sufficiency first. Let $D_C$ be the sub-matrix of
$D(\beta)$ with rows corresponding to all the cells with value zero for all
variables not in $C$, and columns $\mu$, $\beta_0$, $\beta_i$, $\beta_{\{0,i\}}$,
$i \in C$. By (i) and Proposition 1, $D_C$ is full column rank. Let $D_{\bar{C}}$
be the sub-matrix of $D(\beta)$ having rows $d_i$ and $d_{\{i,j\}}$ and columns
$\beta_i$, $\beta_{\{0,i\}}$, $i \in \bar{C}$ and $j$ such that $(i, j) \notin E$.
From (ii) and Lemma 2, $D_{\bar{C}}$ is full column rank. The matrix $D(\beta)$
can be written as:

$$
D(\beta) = \begin{pmatrix}
D_C & 0 & 0 \\
B_1 & D_{\bar{C}} & 0 \\
B_2 & B_3 & P
\end{pmatrix}
$$

where $B_1$, $B_2$ and $B_3$ are non-zero matrices (we omit the dimensions for
brevity), while $P$ is as in Lemma 3. Therefore, $D(\beta)$ is full rank
everywhere.

To see the necessity, note that $D(\beta)$ is not full rank only if one of the
matrices $D_C$, $D_{\bar{C}}$ and $P$ is not full rank. Proposition 1 implies
that assumption (i) is also necessary for $D_C$ to be full rank. From Lemmas
2 and 3, assumption (ii) is also necessary for $D_{\bar{C}}$ and $P$ to be full
rank for all $\beta \in \Omega$.

The following Lemma is a restatement of the assumptions of Proposition 3
in terms of the cliques of the subgraph $G^O$.

Lemma 4. A restatement of the assumption (ii) of Proposition 3 is the
following:

(ii') for each clique $C_0$ in $G^O$ there exists a sequence $\{S_s\}_{s=1}^{q}$ of
complete subgraphs in $G^O$ such that

(a) $|S_s| \le |S_{s-1}|$ for $s \in \{1, \ldots, q - 1\}$, $S_0 = C_0$ and
$|S_q| = 1$;

(b) for $s \in \{1, \ldots, q - 1\}$ and for all $i \in S_s$ there exists
$j \in S_{s+1}$ such that $(i, j) \notin E$.

Proof. It is immediate to see that (ii) implies (ii'). The proof of the
inverse implication is the following. For $S = C_0$ it is trivial. For
$S \subset C_0$ consider the following restriction on the sets $S_1, \ldots, S_q$
in the sequence for $C_0$: let $I_0 = S$ and, for $i \in \{1, \ldots, q'\}$, let
$I_i$ be the set of nodes $v \in S_i$ such that there exists $j \in I_{i-1}$ with
$(j, v) \notin E$ and such that the cardinality of $I_i$ is not greater than
$|S|$ (see the proof in Lemma 2). The existence of $I_{q'+1}$ with
$|I_{q'+1}| < |S|$ follows from $|S_q| = 1$.

Proposition 3 implies that only the models with connected complementary
graph can be identifiable. This contrasts with the condition of global
identifiability in graphical Gaussian models given in [201 [23]. The two
conditions coincide only in the case with $n = 3$ or $n = 4$. In this second case,
an identified model (under both the discrete and Gaussian distribution) has
conditional independence graph as in Figure 2(a). The model associated
to Figure 2(b) is an underidentified one, as it violates assumption (i) of
Proposition 3.
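
Condition (i) and the connectedness requirement are purely graph-theoretic and can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the second test graph (a triangle on nodes 1, 2, 3 with node 4 not adjacent to any observed node) is an assumed example of a graph whose complement has no 3-clique, not a transcription of any figure. A triangle test suffices for condition (i), since any $m$-clique with $m \ge 3$ contains a triangle.

```python
from itertools import combinations

def complement_adjacency(nodes, edges):
    """Adjacency sets of the complementary graph on the observed nodes."""
    present = {frozenset(e) for e in edges}
    adj = {v: set() for v in nodes}
    for u, v in combinations(nodes, 2):
        if frozenset((u, v)) not in present:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def is_connected(adj):
    """Depth-first search from an arbitrary node."""
    nodes = list(adj)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return len(seen) == len(nodes)

def has_triangle(adj):
    """True when the graph contains a clique of size three."""
    return any(v in adj[u] and w in adj[u] and w in adj[v]
               for u, v, w in combinations(adj, 3))

# Observed graph with cliques {1,2}, {2,3}, {3,4}, {4,5}.
path = complement_adjacency([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 5)])
# Assumed counterexample: observed triangle on {1,2,3}, node 4 isolated.
tri = complement_adjacency([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3)])

print(is_connected(path), has_triangle(path))
print(is_connected(tri), has_triangle(tri))
```

In the first graph the complement is connected and contains the clique $\{1, 3, 5\}$; in the second the complement is a star centred at node 4, which is connected but has no 3-clique, so condition (i) fails.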

Example 2. Consider the model with graphs $G^K$ and $\bar{G}^O$ as in Figure
1(a) and (b). The cliques of $G^O$ are $\{1, 2\}$, $\{2, 3\}$, $\{3, 4\}$,
$\{4, 5\}$. Setting $I_0 = \{1, 2\}$ we find that $I_1 = \{5\}$, that is $I_1$ is
formed by just one element. The same holds for the other cliques.

Example 3. Let the cliques in the graph $G^O$ be $C_1 = \{1, 4, 7, 9\}$,
$C_2 = \{1, 4, 6, 9\}$, $C_3 = \{1, 4, 6, 8\}$, $C_4 = \{2, 4, 7, 9\}$,
$C_5 = \{2, 4, 6, 9\}$, $C_6 = \{2, 4, 6, 8\}$, $C_7 = \{1, 5, 7, 9\}$,
$C_8 = \{2, 5, 7, 9\}$, $C_9 = \{3, 5, 8\}$, $C_{10} = \{3, 6, 8\}$,
$C_{11} = \{1, 5, 8\}$, $C_{12} = \{2, 5, 8\}$, $C_{13} = \{3, 5, 7\}$. In Figure 4
the corresponding graph $\bar{G}^O$ is represented. We can verify from the graph
$\bar{G}^O$ that the assumptions of Proposition 3 hold. For example, for the
interaction term involving $I_0 = \{1, 5, 8\}$ we have the identifying sequence
$I_1 = \{2, 4, 9\}$, $I_2 = \{3\}$. By considering $I_0 = \{1, 4, 6, 8\}$ we have
the identifying sequence $I_1 = \{3, 7\}$, $I_2 = \{4, 6\}$ and $I_3 = \{5\}$.
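
The two sequences quoted in Example 3 can be verified mechanically. The sketch below is an illustration under stated assumptions: it takes the edges of the observed graph to be exactly the pairs contained in the thirteen listed cliques, and it checks a candidate sequence against the clique-based formulation of Lemma 4 (every set complete, cardinalities non-increasing, final set a singleton, and every node of one set non-adjacent to some node of the next). The function names are hypothetical.

```python
from itertools import combinations

cliques = [
    {1, 4, 7, 9}, {1, 4, 6, 9}, {1, 4, 6, 8}, {2, 4, 7, 9}, {2, 4, 6, 9},
    {2, 4, 6, 8}, {1, 5, 7, 9}, {2, 5, 7, 9}, {3, 5, 8}, {3, 6, 8},
    {1, 5, 8}, {2, 5, 8}, {3, 5, 7},
]
# Assumption: the observed graph has an edge for every pair inside a clique.
edges = {frozenset(p) for c in cliques for p in combinations(sorted(c), 2)}

def is_complete(nodes):
    """True when every pair of `nodes` is adjacent in the observed graph."""
    return all(frozenset(p) in edges for p in combinations(sorted(nodes), 2))

def is_identifying_sequence(seq):
    """Check a sequence [S_0, S_1, ..., S_q] against the Lemma 4 conditions."""
    if not all(is_complete(s) for s in seq):
        return False
    sizes = [len(s) for s in seq]
    # Cardinalities must not increase and the last set must be a singleton.
    if sizes[-1] != 1 or any(b > a for a, b in zip(sizes, sizes[1:])):
        return False
    # Each node must have a non-adjacent partner in the next set.
    return all(
        any(j != i and frozenset((i, j)) not in edges for j in nxt)
        for cur, nxt in zip(seq, seq[1:])
        for i in cur
    )

first = is_identifying_sequence([{1, 5, 8}, {2, 4, 9}, {3}])
second = is_identifying_sequence([{1, 4, 6, 8}, {3, 7}, {4, 6}, {5}])
broken = is_identifying_sequence([{1, 5, 8}, {3}])
print(first, second, broken)
```

The third call illustrates a failing candidate: nodes 5 and 8 are both adjacent to node 3 (cliques $C_9$ and $C_{13}$), so $\{3\}$ cannot directly follow $\{1, 5, 8\}$.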

Example 4- In Figure [5] the complementary graph G° associated to an 
unidentified model is presented. The condition (ii) of Proposition 3 does 
not hold for {4,8} (as well as {4,9}, {5,9}, {5,8}) and any of its superset 
such as {4, 5, 8, 9}, since the set of nodes connected in G to {4} and {8} are 
{6} and {7}, respectively, and (6, 7) G E. As a consequence, the interaction 
among $I_0 = \{4, 5, 8, 9\}$ of Figure 5 is not identifiable for $\beta$ in
$\Omega_I$, which is constructed as follows. First find all $V_0$ sets, which in
this case are $V_0 = \{6\}$, with $I^{\{6\}} = \{8, 9\}$, and $V_0 = \{7\}$, with
$I^{\{7\}} = \{4, 5\}$. The unidentified subspace is therefore the following:

$$
\begin{cases}
\beta_{\{0,6\}} + \beta_{\{0,6,8\}} + \beta_{\{0,6,9\}} + \beta_{\{0,6,8,9\}} = 0 \\
\beta_{\{0,7\}} + \beta_{\{0,4,7\}} + \beta_{\{0,5,7\}} + \beta_{\{0,4,5,7\}} = 0
\end{cases}
$$

Figure 4: The complementary graph $\bar{G}^O$ of Example 3



The above examples show how to determine the expression of the (sub)space
where identifiability breaks down. The following situations may arise:

1. if condition (i) of Proposition 3 does not hold, there is no
$\beta^0 \in \Omega$ such that the model is locally identified;

2. if condition (i) of Proposition 3 does hold and condition (ii) of
Proposition 3 fails, i.e. there is (at least) a complete subset in $G^O$
admitting no identifying sequence, the model is locally identified everywhere
except in the subspace $\Omega_I$ of zero measure for any complete subset
$I_0 \in G^O$ admitting no identifying sequence, with $\Omega_I \subset \Omega$
given by the following expression:

$$
\beta_{\{0,r\}} + \sum_{I \subseteq I_0} \delta(r, I)\,\beta_{\{0,I,r\}} = 0
\quad \text{for any } r \in \mathrm{bd}_{\bar{G}^O}(I_0),
$$

where $\delta(r, I) = 1$ if $\{r, I\}$ is complete in $G^O$ and 0 otherwise, and
$\mathrm{bd}_{\bar{G}^O}(I_0)$ denotes the boundary of $I_0$ in $\bar{G}^O$.

Figure 5: The complementary graph $\bar{G}^O$ of Example 4



Example 5. In Figure [6] the complementary graph G° associated to an 
unidentified model is presented. The condition (ii) of Proposition 3 does not 
hold for {1, 3, 7} and {3, 4, 6, 7}: in fact, in G° node 3 is connected only to 
node 5 and node 7 is connected only to node 2, with (2, 5) E E. By adding 
the edge (3, 7) in the missing graph $\bar{G}^O$, we get a locally identified model.

4 Extensions 

In this section we extend the condition for local identification to more general 
models to include (a) observed variables with a generic number of levels and 
(b) observed variables that are not connected to the unobserved one. The 
proofs are in the Appendix. The first extension leads to the following: 

Theorem 1. Consider an undirected graphical model $G^K$ over discrete
variables $(A_0, A_1, \ldots, A_n)$, with $A_0$ an unobserved binary variable and
$(0, u) \in E$, for all $u \in O$. A necessary and sufficient condition for the
rank of the parametrization of $G^K$ to be full everywhere is that conditions
(i) and (ii) of Proposition 3 hold.

Figure 6: The complementary graph $\bar{G}^O$ of Example 5

As might be anticipated, the graphical model corresponding to Figure 1
is locally identified. Let $T_1$ be a non-empty set of observed variables not
connected to the unobserved one. The idea is to regard the derivative matrix
of Proposition 3 as the derivative matrix of this larger model for the rows
of $\mu_Y$ associated to cells with value zero of all variables in $T_1$.

Theorem 2. Consider an undirected graphical model $G^K$ over discrete
variables $(A_0, A_1, \ldots, A_n)$, with $A_0$ an unobserved binary variable and
$(0, u) \in E$, for all $u \in K \setminus \{0 \cup T_1\}$. Let
$S = K \setminus T_1$. A necessary and sufficient condition for the rank of the
parametrization to be full everywhere is that the subgraph $G^S$ satisfies the
condition of Proposition 3.

5 Concluding remarks 

One of the issues in estimating graphical models with hidden nodes concerns
identifiability. In this paper conditions for the identification of discrete
undirected graphical models with one hidden binary variable have been
determined. For conditionally full rank models, the expression of the
unidentified space has been determined, permitting a reparametrization to
achieve local identifiability.

The derivations here presented could also be obtained using algebraic 
techniques, which prove to be an effective tool to study singularities in the 
parameter space. However, they do not directly provide the interpretation 
in terms of the associated conditional independence graph, which we believe 
is the added value of the paper. 

Issues of identification of all models that are obtainable as a one-to-one
reparametrization of the discrete undirected graphical model can be
addressed using the results presented in the paper. Extensions to models with
more than one latent variable can be addressed by exploiting the factorization
dictated by the graph. This can be done immediately, provided that the
hidden nodes in the graph are isolated. Extensions to models with hidden
variables having more than two levels are not straightforward, as the simple
structure of the matrix $D(\beta)$, as outlined in (2), does not hold any more.



6 Appendix 

Proof of Theorem 1: First assume that all the variables are binary except
the $A_1$ variable, which has three levels. Partition $\beta$ into three
subsets: $\beta_a = \{\mu, \beta_0\}$; $\beta_b$, corresponding to the
non-zero interaction terms of any order for values in $\{0, 1\}$ of the
observed variables; and $\beta_c$, containing all other parameters. After
ordering so that the $A_1$ variable runs the slowest, the matrix $D(\beta)$
has the following structure:



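\[
D(\beta) =
\begin{pmatrix}
D(\beta_a)   & D(\beta_b)                         & 0_{2^{n} \times |\beta_c|} \\
D^{*}(\beta_a) & 0_{2^{(n-1)} \times |\beta_b|} & D^{*}(\beta_c)
\end{pmatrix}
\]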



where $[D(\beta_a) \mid D(\beta_b)]$ is the sub-matrix of the derivatives
with respect to $\beta_a$ and $\beta_b$ and has full rank if conditions (i)
and (ii) of Proposition 3 hold. Note that, by construction, $D^{*}(\beta_c)$
has a structure similar to that of the sub-matrix of $D(\beta_b)$ formed by
the last $2^{(n-1)}$ rows and all columns. Therefore $D^{*}(\beta_c)$ has full
rank if conditions (i) and (ii) of Proposition 3 hold. To see the necessity,
note that $D(\beta_b)$ has full rank only if the conditions of Proposition 3
are satisfied. The proof of the theorem for $A_1$ having $l_v$ levels follows
straightforwardly. By a similar argument, the extension to a generic number
of levels of the $A_i$ variables, $i \in O$, follows.



❖ 



Proof of Theorem 2: Note that $T_1$ is the set of observed variables $i$
such that $(i, 0) \notin E$. We first focus on models with only binary
variables. Let $T_2 \subset S \setminus \{0\}$ be the set of observed
variables such that $(i, j) \in E$, $i \in T_1$, $j \in T_2$. If $T_1$ or
$T_2$ is empty the proof is trivial. To start with, we assume $|T_1| = 1$.
Partition $\beta$ into the subsets $\beta_d$, containing all the non-zero
interaction terms among the variables in $S$, and $\beta_e$, containing all
the other elements. The non-zero interaction terms among the latent variable
and the observed variables are in $\beta_e$. The matrix $D(\beta)'$ has the
following structure:
\[
F \qquad D(\beta_e)
\]



$D(\beta_d)'$ and $D(\beta_e)'$ are the derivative sub-matrices for the
corresponding elements; $F$ is a sub-matrix with the same number of rows as
$D(\beta_d)$. The sub-matrix $D(\beta_e)$ has full rank because its rank
corresponds to that of the design matrix of the model for $T_1 \cup T_2$.
The sufficiency follows easily from the block-diagonality of the matrix. The
necessity follows from the fact that $F$ has full rank if and only if
$D(\beta_d)$ has full rank. The extension to a generic number of variables in
$T_1$ follows after noting that the matrix $D(\beta)$ is built as follows:



\[
D(\beta) =
\begin{pmatrix}
D(\beta_d) & 0_{2^{|S|-1} \times m} \\
F^{*}      & D(\beta_e)
\end{pmatrix}
\]



where $D(\beta_e)$ is the derivative sub-matrix for the vector $\beta_e$
defined as in the previous step, $D(\beta_d)$ is the derivative sub-matrix
for the vector $\beta_d = \beta \setminus \beta_e$, and $F^{*}$ is a
sub-matrix with the same number of rows as $D(\beta_e)$. The same
considerations as in the previous case hold. The extension to a generic number
of levels of the $A_i$ variables, $i \in O$, follows by induction.
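
The sufficiency step for a matrix with this block structure can be spelled out with a short rank argument. This is a sketch under the assumption that both diagonal blocks, $D(\beta_d)$ and $D(\beta_e)$, have full column rank; the coefficient vectors $x$ and $y$ are introduced here only for illustration.

```latex
\[
\begin{pmatrix} D(\beta_d) & 0 \\ F^{*} & D(\beta_e) \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} = 0
\;\Longrightarrow\;
D(\beta_d)\,x = 0
\;\Longrightarrow\;
x = 0
\;\Longrightarrow\;
D(\beta_e)\,y = -F^{*}x = 0
\;\Longrightarrow\;
y = 0 .
\]
```

Under that assumption the null space contains only the zero vector, so the whole matrix has full column rank.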